Goto

Collaborating Authors

 bastien bubeck



Bounded Regret for Finite-Armed Structured Bandits

Neural Information Processing Systems

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problemdependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.


Best-Arm Identification in Linear Bandits

Neural Information Processing Systems

We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the G-optimality criterion used in optimal experimental design.


Best-Arm Identification in Linear Bandits

Neural Information Processing Systems

We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the G-optimality criterion used in optimal experimental design.


Bounded Regret for Finite-Armed Structured Bandits

Neural Information Processing Systems

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problemdependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.


053r - Chasing convex bodies and other random topics with Dr. Sébastien Bubeck

#artificialintelligence

Dr. Sébastien Bubeck is a mathematician and a senior researcher in the Machine Learning and Optimization group at Microsoft Research. He's also a self-proclaimed "bandit" who claims that, despite all the buzz around AI, it's still a science in its infancy. That's why he's devoted his career to advancing the mathematical foundations behind the machine learning algorithms behind AI. Today, Dr. Bubeck explains the difficulty of the multi-armed bandit problem in the context of a parameter- and data-rich online world. He also discusses a host of topics from randomness and convex optimization to metrical task systems and log n competitiveness to the surprising connection between Gaussian kernels and what he calls some of the most beautiful objects in mathematics.


Best-Arm Identification in Linear Bandits

Neural Information Processing Systems

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the $G$-optimality criterion used in optimal experimental design.


Bounded Regret for Finite-Armed Structured Bandits

Neural Information Processing Systems

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.